Abstract


This paper describes the use of composite estimation and state space 
modelling techniques for analysis of data from a repeating survey. The 
techniques take account of common sample between successive months and 
the resulting autocorrelation structure of the sampling error. The 
techniques are illustrated by an investigation of the effect of introducing 
telephone interviewing in the Australian Labour Force Survey. 


1 Introduction 


1.1 Techniques for analysing data from repeating surveys 


This paper describes the use of composite estimation and state space modelling 
techniques for analysis of data from a repeating survey. The techniques take 
account of common sample between successive months and the resulting 
autocorrelation structure of the sampling error. The techniques are illustrated
by an investigation of the effect of telephone interviewing in the Australian 
Labour Force Survey (LFS). 


The problem of measuring the statistical impact of a methodology change in a 
repeated survey is a general one. The textbook approach to measuring the 
difference between two methodologies is to conduct a 'parallel run', that is, a
concurrent survey using the new methodology. This is often too expensive to be
practical. In the LFS the new methodology was gradually 'phased in' to
the survey over seven months, which allowed estimation based on comparisons
between portions of the survey using the different methodologies.


Approaches to measuring the statistical impact of a methodology change must
account for the effects of the new methodology alongside other effects on the 
estimates. Typically the units surveyed at a particular time point have different 
histories in the survey. Some units are newly selected while others were 
selected at various previous time points. This leads to autocorrelations between 
specific subsets of the sample at different times. These autocorrelations are 
specifically accounted for in both the composite estimation and state space 
modelling techniques. 


Section 2 describes the structure of the LFS and the data available to monitor the 
introduction of telephone interviewing. Section 3 describes an additive model 
for the error structure of the data that accounts for autocorrelations in the 
sampling error and for biases associated with how many times a group of 
dwellings has been surveyed. It describes estimation of these autocorrelations 
and biases based on data from before telephone interviewing was introduced. 
The section also introduces a number of models describing possible patterns of 
effect of telephone interviewing. 


Section 4 describes an approach to designing composite estimators for additive 
effects, and presents composite estimators of the effect of telephone 
interviewing. Section 5 presents a state space model for a time series 
incorporating sampling error and bias effects. This approach was found to be 
very flexible and practical in the analysis of telephone interviewing. The results 
of the analysis are presented in section 6. 


2 Measuring the impact of changes to the LFS 


2.1 The Labour Force Survey sample 


The LFS is a monthly household survey collecting information on the labour 
force status (employed, unemployed or not in labour force) of persons aged 15 
or older. The sample contains over 30,000 dwellings and over 70,000 persons. 
The survey is designed to give very accurate estimates — for example, estimates 
of the employed as a proportion of the civilian population aged 15 or over have a 
standard error of under 0.2 percentage points. 


The dwellings are selected by a multi-stage sampling scheme; geographic areas 
known as collectors’ districts (CDs) are selected, then dwellings are selected 
within these CDs. The sample of CDs is divided into eight 'rotation groups'
(RGs). Each month, the dwellings in one RG are rotated out of the sample, and 
replaced with dwellings from the same CDs. Thus selected dwellings remain in 
the survey for eight months and are then replaced by nearby dwellings. 


2.2 Telephone interviewing 


The LFS has until recently used face to face (FTF) interviewing, but from August 
1996 telephone interviewing (TI) has been introduced. 


Under the new method, FTF is used for the first time a set of dwellings is
selected, and TI at later stages. In practice, not all dwellings agree or are able to 
be interviewed by telephone, so a rotation group using TI will actually use a 
combination of telephone and face to face interviews. 


TI was introduced one RG at a time over the period August 1996 to February 
1997. Starting in July 1996, every dwelling entering the survey was interviewed 
the first time by FTF, but invited to be interviewed by telephone in succeeding 
months. This approach led to a gradual phase-in of TI, with one RG using TI in 
August, two RGs using TI in September, and so on. By February, seven of the 
RGs were using TI, the remaining RG being the one with dwellings entering the 
survey for the first time. The period August 1996 to February 1997 is called "the
phase-in period". 
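The phase-in pattern described above can be made concrete with a short sketch. It is illustrative only: the function name `ti_groups` and the month indexing (0 for July 1996) are assumptions introduced here, not part of the survey documentation.

```python
# Illustrative sketch of the TI phase-in; names and indexing are assumptions.
MONTHS = ["Jul 1996", "Aug 1996", "Sep 1996", "Oct 1996",
          "Nov 1996", "Dec 1996", "Jan 1997", "Feb 1997"]


def ti_groups(month_index):
    """Return the months-in-survey values of the RGs using TI.

    month_index counts months from July 1996 (0 = July 1996). An RG uses TI
    if its dwellings entered the survey in July 1996 or later and are being
    interviewed for at least the second time.
    """
    if month_index < 1:
        return set()
    newest_ti = 2                        # dwellings in their second month
    oldest_ti = min(month_index + 1, 8)  # earliest TI entrants, capped at 8
    return set(range(newest_ti, oldest_ti + 1))


for index, month in enumerate(MONTHS):
    print(month, sorted(ti_groups(index)))
```

Under these assumptions one RG uses TI in August 1996, two in September, and seven by February 1997, matching the description of the phase-in period.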


2.3 Measuring the effect of telephone interviewing 


Experience within the ABS and in other agencies suggests that changing the 
mode of interview has the potential to affect responses — see for example 
Kormendi and Noordhoek (1989) and Drew (1991). Published studies are not
directly comparable to the Australian situation, and so there was little indication 
of what sort of effect could be expected. 


The phased introduction of TI in the LFS allows the difference in estimates 
between TI and FTF methods to be measured. The difference will be referred to 
as "the TI effect". An early measure of the likely size of any TI effect was of great 
importance to users of the statistics. 


The objective of this paper is to record methods used in analysing the effect of 
telephone interviewing on estimates of labour force status. The approaches 
used allow for considerable flexibility in modelling the TI effect and addressing 
issues such as whether the effect changed over time.


The analysis identified an effect on estimates from using the TI methodology
over the early months of its introduction. The effect appears to have been 
transitory, with later months consistent with a zero effect. 


2.4 Analysis at rotation group level 


It was decided to evaluate the effect of TI by analysing estimates at RG level. 
Such estimates are sufficiently stable to allow standard methods of analysis of 
monthly data to be used. To analyse data at person level would be very 
complicated, particularly as even in a RG using TI the persons can elect to be 
interviewed FTF, and this choice may be related to labour force status. 


Because the analysis is at RG level, the analysis measures not the effect of the 
actual interview mode, but the effect of the mix of telephone and FTF interviews 
that results from using TI in a RG. This leads straightforwardly to measures of 
the effect on published estimates. 


2.5 Available estimates 


Monthly estimates of persons by labour force status were obtained for each RG, 
categorised by month, sex, age (grouped as 15-19, 20-24, ... , 50-54, 55-64, 
65+) and part-of-state (14 geographic regions covering Australia).


Within these categories, the estimates for each RG were pro-rated to match
known population benchmarks. This "benchmarking" ensures that each RG 
represents a similar mix of individuals. This is important since the intention is to 
compare population estimates derived from RGs using TI with the estimates 
from other RGs using FTF. 


Key estimates investigated were the proportion of the in scope population 
unemployed and the proportion employed, both by sex and overall. Data was 
available at RG level from January 1990 onwards. 


2.6 Components of the analysis 


Modelling the error structure 


Knowledge of the structure of the time series up to July 1996 allows modelling of 
the situation before TI was introduced. This provides a baseline for evaluating 
whether TI had any effect.


A model describing the autocorrelation structure of rotation group estimates was 
fitted to the time series up to July 1996. All the estimates analysed were of rates 
or proportions rather than counts, since rates were assumed to have a
reasonably constant variance and autocorrelation structure over the period from 
January 1990 on. 


Designing a composite estimator 


Simple estimators of the TI effect can be based on a comparison of the TI and 
non-TI RGs at a single time point. Composite estimators using data from a 
number of months were developed, achieving a much lower variance than the 
simple estimators. The composite estimators are based on an additive model 
and a model for the autocorrelation structure of the series. 


Fitting time series models 


The autocorrelation structure can be used to specify the sampling error 
component of a state space model for the time series. An additive model was 
used which incorporates a parameter for the TI effect. 


This approach is very flexible, and was used to investigate various models for a TI 
effect that is not constant over time. Results from time series modelling were 
compared to those from composite estimation. 


3 Modelling the error structure 


The analysis depends on an understanding of the errors affecting the data, and 
especially the correlation structure of the sampling errors. The techniques for 
modelling and measuring these are described in this section. 


3.1 An additive model for the series 


This work uses an additive model that describes the series of estimates from 
each of the eight RGs. The series is decomposed into the true value of the item 
being measured, a bias associated with the survey procedures (in this case, TI or 
FTF interviewing), and a sampling error. 


Each RG at a given time $t$ is characterised by the number of times $j$ the dwellings
in the RG have been sampled. For example, in the RG with months-in-survey
$j = 1$ the dwellings are being sampled for the first time.


Let $y_t^j$ be the estimated value for the RG with months-in-survey $j$ at time $t$. Let
$Y_t$ be the true value at time $t$. The following additive model at RG level
incorporates sampling error $e_t^j$ and two sources of bias, $b^j$ (the months-in-survey
effect) and $T_t^j$ (the TI effect).


$$ y_t^j = Y_t + b^j + T_t^j + e_t^j \tag{1} $$


3.2 A model for the months-in-survey effect 


Even before introducing TI there was an established months-in-survey effect on
RG estimates. This effect captures the tendency for RGs containing dwellings
sampled more times to report somewhat lower employment and
unemployment (and higher numbers not in the labour force). A simple model
assumes that the bias for months-in-survey $j$ is constant over time at a value $b^j$,
known as the months-in-survey effect, and that the net effect of these biases
across all values of $j$ is 0. A simple estimate based on the mean over $N$ time
points before TI was introduced would be


$$ \hat{b}^j = \frac{1}{N} \sum_{t=1}^{N} \left( y_t^j - \bar{y}_t \right) \quad \text{for } \bar{y}_t = \frac{1}{8} \sum_{j=1}^{8} y_t^j . \tag{2} $$
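A minimal sketch of the simple estimate in (2) follows. The function name and the data layout (one row per month, eight RG estimates per row ordered by months-in-survey) are assumptions made for illustration; the numbers in the usage lines are invented, not survey data.

```python
def months_in_survey_effect(y):
    """Estimate the months-in-survey effect for each months-in-survey value.

    y[t][j - 1] is the RG estimate for months-in-survey j at time t, using
    only months before TI was introduced. Each estimate is the average over
    time of the RG estimate minus the mean of the eight RG estimates.
    """
    n_months = len(y)
    n_groups = len(y[0])
    monthly_means = [sum(row) / n_groups for row in y]
    return [
        sum(y[t][j] - monthly_means[t] for t in range(n_months)) / n_months
        for j in range(n_groups)
    ]


# Invented example: a constant bias pattern on top of a changing true value.
bias = [0.3, 0.1, 0.0, -0.1, -0.3, 0.0, 0.0, 0.0]
estimates = [[true_value + b for b in bias] for true_value in (5.0, 6.0, 7.0)]
print(months_in_survey_effect(estimates))
```

By construction the eight estimated effects sum to zero, which matches the assumption that the net effect across all months-in-survey values is 0.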


Table 1 shows estimates of this months-in-survey effect based on data from 
January 1990 to July 1996. It appears that estimates of both employment and 
unemployment tend to be lower for RGs with dwellings surveyed for more 
times. 


Table 1: Effect of months-in-survey on rotation group estimates 
Simple estimate (percentage points) 


LFS estimate               Months-in-survey of rotation group
                              1      2      3      4      5      6      7      8
Proportion employed        0.15   0.08   0.05   0.01  -0.06  -0.09  -0.04  -0.09
Proportion unemployed      0.14   0.05   0.00  -0.01   0.00  -0.02  -0.07  -0.08


These estimates were used to adjust for the months-in-survey effect in the 
composite estimation approach, as described in Section 4. In the state space 
modelling approach, values for the b’ parameters were estimated simultaneously 
with other parameters in the model — see Section 5. In using the state space 
modelling approach the months-in-survey biases can be allowed to vary through 
time. Given the limited span of data available this was not pursued, since the 
months-in-survey biases must be assumed to change slowly or not at all if
changing biases over the phase-in period are to be treated as the effects of using 
TI. 


3.3 Models for the TI effect


The effect of the TI method on RG estimates is modelled as additional to the
established months-in-survey bias. Denote by $T_t^j$ the change in bias that results
from the TI effect for RG $j$ at time $t$ under TI. $T_t^j$ is zero for RGs not using TI, and
possibly non-zero for RGs using TI. Four different models for the TI effect for
RGs using TI are presented here.


Constant effect model $M_0$


In the first few months of use of the TI approach the main concern was to
identify if there was any evidence of a significant TI effect. To this end a model
was proposed in which every RG using TI was affected by the same additive bias
$T$. In this model $T_t^j = T$.


Monthly effect model $M_1$


It is also important to obtain some picture of whether the TI effect is changing
with time. A simple approach to this question is to use a model with separate TI
effects for each month. These can then be inspected to identify any patterns
over the months. In this monthly effect model, $T_t^j = T_t$, a separate constant for
each time $t$.


Two-level effect model $M_2$


From inspection of estimates from the monthly effect model, it became apparent
that earlier months up to November 1996 indicated a TI effect but that in later
months the effect appeared reduced. To address the issue of whether there was
a significant change after November a two parameter model was used, in which


$$ T_t^j = \begin{cases} T_{INITIAL} & \text{for } t \text{ up to November 1996} \\ T_{FINAL} & \text{for } t \text{ from December 1996 onward.} \end{cases} \tag{3} $$


In this model a significant value of $T_{FINAL}$ would indicate that the long-term TI
effect was significantly different from zero. A significant value of
$T_{DIFF} = T_{INITIAL} - T_{FINAL}$ would indicate that the TI effect had changed significantly
from the early months to the later months.


Transitory effect model $M_3$


One further model was introduced which assumes a constant TI effect up to
November 1996 which then decreased gradually to 0 in February 1997. This
model can be written $T_t^j = c_t T_{TRANS}$ for $c_t$ taking the value 1 up to November
1996 and decreasing gradually to 0 by February 1997.
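The four models can be compared side by side in a small sketch. The function and parameter names are hypothetical, and the linear decline used for the transitory model (2/3 in December, 1/3 in January, 0 in February) is an assumption: the text says only that the effect decreased gradually to 0.

```python
# Months of the phase-in period; the list and names below are illustrative.
MONTHS = ["Aug 1996", "Sep 1996", "Oct 1996", "Nov 1996",
          "Dec 1996", "Jan 1997", "Feb 1997"]
LAST_INITIAL = MONTHS.index("Nov 1996")


def ti_effect(model, month, params):
    """TI effect for an RG using TI in the given month under models M0 to M3."""
    t = MONTHS.index(month)
    if model == "M0":      # constant effect
        return params["T"]
    if model == "M1":      # separate effect for each month
        return params["T_by_month"][month]
    if model == "M2":      # two-level effect
        return params["T_INITIAL"] if t <= LAST_INITIAL else params["T_FINAL"]
    if model == "M3":      # transitory effect with an ASSUMED linear decline
        last = len(MONTHS) - 1
        c = 1.0 if t <= LAST_INITIAL else (last - t) / (last - LAST_INITIAL)
        return c * params["T_TRANS"]
    raise ValueError("unknown model: " + model)


print(ti_effect("M2", "Dec 1996", {"T_INITIAL": -1.0, "T_FINAL": -0.2}))
print(ti_effect("M3", "Jan 1997", {"T_TRANS": -0.9}))
```

RGs not using TI have a TI effect of zero under every model, so the function only needs to be called for RGs that use TI.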


3.4 Modelling autocorrelated sampling error 


This section outlines the notation and calculations for measuring standard error 
and autocorrelations at RG level. It is a summary of methods presented in Bell 
and Carolan (1998). 


Basic structure of the sampling error 


A model is used to describe the sampling error structure of the rotation groups. 
The model assumes that sampling errors from different rotation groups have a 
common standard error and are uncorrelated. For the same RG, the correlation
between sampling errors from different survey months $t_1$ and $t_2$ depends only
on the gap between the times $|t_2 - t_1|$ and the number of occasions the
dwellings in the RG have been selected (which determines if the two survey
months have the same sample of dwellings).


The sampling error is assumed to have $E(e_t^j) = 0$, $\mathrm{Var}(e_t^j) = 8\sigma_e^2$ and
autocorrelations for lag $k > 0$, $1 \le j \le 8$ and $1 \le m \le 8$ given by


$$ \mathrm{Corr}(e_t^j, e_{t-k}^m) = \begin{cases} \rho_k^j & \text{if } m = j - k + 8i \text{ for integer } i \ge 0 \text{ (i.e. same RG)} \\ 0 & \text{otherwise (i.e. different RGs).} \end{cases} \tag{4} $$
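The index arithmetic in (4) can be checked with a short sketch. The function names are hypothetical. The second function uses the fact that dwellings being surveyed for the $j$th time entered the sample $j - 1$ months earlier, so they were also in the sample $k$ months ago only when $k$ is at most $j - 1$.

```python
def same_rg_months_in_survey(j, k):
    """Months-in-survey value, k months earlier, of the RG that now has value j.

    This is m = j - k + 8i with the integer i chosen so that 1 <= m <= 8.
    """
    return (j - k - 1) % 8 + 1


def same_dwellings(j, k):
    """True if the RG surveyed the same dwellings now and k months earlier."""
    return k < j


print(same_rg_months_in_survey(3, 1), same_dwellings(3, 1))  # 2 True
print(same_rg_months_in_survey(1, 1), same_dwellings(1, 1))  # 8 False
```

In the second usage line the RG has just received a new set of dwellings, so the month before it was completing its eighth month with the previous set.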


Estimating variance and autocorrelations 


The variances and the sampling error autocorrelations were estimated using
$N = 79$ months of RG estimates from January 1990 to July 1996. Separate
estimates are obtained for each time in survey $j$ and lag $k$, using the following
formulae, based on Bell and Carolan (1998).


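Define the pseudo-error for the RG with dwellings sampled for the $j$th time to
be $\hat{e}_t^j = y_t^j - \hat{b}^j - \bar{y}_t$, where $\bar{y}_t$ is the mean of the eight RG estimates at time $t$.
The unbiased estimate of variance for $y_t^j$, assuming variance is
constant over time, is then given by


[Equation (5) is not legible in the source.]


Estimates of the autocovariances of the pseudo-errors at lag $k$ are given by


$$ c_k^j = \frac{1}{N-k} \sum_{t=k+1}^{N} \hat{e}_t^j \, \hat{e}_{t-k}^{\,j-k+8i} , \quad k = 0, 1, 2, \ldots \tag{6} $$


for integer $i$ chosen so that $1 \le j - k + 8i \le 8$.


Estimates of the pseudo-error autocorrelations $\hat{\rho}_{Pk}^j$ at rotation group level are
given by


$$ \hat{\rho}_{Pk}^j = c_k^j / c_0^j . \tag{7} $$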
It is straightforward to show that these estimates are biased as estimates for the
true autocorrelations. In fact, their expected values are given by


$$ E(\hat{\rho}_{Pk}^1, \ldots, \hat{\rho}_{Pk}^8) = (\rho_k^1, \ldots, \rho_k^8) A , $$


for $A$ an 8 by 8 matrix with diagonal elements [illegible in the source]
and off-diagonal elements [illegible in the source]. This leads to using corrected estimates $\hat{\rho}_k^j$ of the
autocorrelation at lag $k$ for estimates from the RG in survey for the $j$th time,
given by


$$ (\hat{\rho}_k^1, \ldots, \hat{\rho}_k^8) = (\hat{\rho}_{Pk}^1, \ldots, \hat{\rho}_{Pk}^8) A^{-1} . \tag{8} $$
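As a plausibility check on the form of $A$, the following sketch derives its entries under two simplifying assumptions that are not stated in the source: the months-in-survey effects are treated as known, so that each pseudo-error equals $e_t^j - \bar{e}_t$ with $\bar{e}_t$ the mean of the eight sampling errors at time $t$, and the expectation of the ratio in (7) is approximated by the ratio of expectations. Here $8\sigma_e^2$ denotes the common variance of the RG sampling errors, $\rho_k^j$ the true autocorrelation at lag $k$, and $m$ the months-in-survey value of the same RG at time $t - k$.

```latex
\begin{aligned}
\mathrm{Cov}\left(e_t^j - \bar{e}_t,\; e_{t-k}^m - \bar{e}_{t-k}\right)
  &= 8\sigma_e^2 \left( \rho_k^j - \tfrac{1}{8}\rho_k^j - \tfrac{1}{8}\rho_k^j
     + \tfrac{1}{64} \sum_{i=1}^{8} \rho_k^i \right) \\
  &= 8\sigma_e^2 \left( \tfrac{49}{64}\,\rho_k^j
     + \tfrac{1}{64} \sum_{i \ne j} \rho_k^i \right), \\
\mathrm{Var}\left(e_t^j - \bar{e}_t\right)
  &= 8\sigma_e^2 \left( 1 - \tfrac{2}{8} + \tfrac{1}{8} \right) = 7\sigma_e^2, \\
\frac{\mathrm{Cov}}{\mathrm{Var}}
  &= \tfrac{7}{8}\,\rho_k^j + \tfrac{1}{56} \sum_{i \ne j} \rho_k^i .
\end{aligned}
```

Under these assumptions $A$ would have diagonal elements $7/8$ and off-diagonal elements $1/56$, so each row sums to 1. These values come from the sketch rather than from the source, and should be checked against Bell and Carolan (1998) before use.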


3.5 A model to smooth the autocorrelations


The autocorrelation estimates are highly variable; this variability can be reduced
by fitting to the estimates a model with a small number of parameters. Assume
that the sampling error autocorrelation in a rotation group depends only on the
lag and on whether the RG has a common sample of dwellings between the two
time points. This leads to the model


$$ \rho_k^j = \begin{cases} \rho_{Wk} & \text{if } j > k \text{ (i.e. same dwellings)} \\ \rho_{Bk} & \text{otherwise (i.e. different dwellings).} \end{cases} \tag{9} $$


It would be possible to obtain estimates of $\rho_{Wk}$ as an average of the estimates $\hat{\rho}_k^j$
for which $j > k$, and of $\rho_{Bk}$ as an average of the other lag $k$ autocorrelations. An
alternative is to smooth the autocorrelations using a model, so as to improve
their stability and to enforce simple relationships, such as autocorrelations
decreasing with increases in the lag.


A four parameter model was introduced for the autocorrelations. The
parameters $r_U$, $r_P$, $\phi_P$ and $\phi_B$ have an interpretation in the context of the state
space modelling that will be presented in section 5. Under this model the
autocorrelations are given by


$$ \rho_{Wk} = (1 - r_U^2) \left( \phi_P^k r_P^2 + \phi_B^k (1 - r_P^2) \right) \quad \text{and} \tag{10} $$

$$ \rho_{Bk} = (1 - r_U^2) \, \phi_B^k (1 - r_P^2) . \tag{11} $$

A simpler three parameter model (with $r_U = 0$) was rejected because it did not fit
the estimated autocorrelations well.


The parameters were fitted so that for lags 1 to 8 the autocorrelations from the
model were as close as possible to the estimated autocorrelations. Closeness
was defined by least squares distance. The parameters were chosen to minimise
the distance function


$$ D = \sum_{k=1}^{8} \left( \sum_{j > k} (\hat{\rho}_k^j - \rho_{Wk})^2 + \sum_{j \le k} (\hat{\rho}_k^j - \rho_{Bk})^2 \right) . \tag{12} $$


The optimal parameters were chosen by a numerical search procedure,
restricting $r_U$, $r_P$ and $\phi_P$ to be in [0, 1] and restricting $\phi_B$ to [0.94, 0.98]. (This
restriction ensures that any long term autocorrelation $\rho_{Bk}$ decreases slowly with
increasing lag. Some restriction is necessary to ensure a single best solution in
situations where $\rho_{Bk}$ is near zero.)
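The smoothing step can be sketched as follows. The source does not say which numerical search procedure was used, so the coarse grid search here is a stand-in chosen for simplicity, and all function names are hypothetical. The model functions follow the four parameter form described above, with $r_U$, $r_P$, $\phi_P$ and $\phi_B$ written as `r_u`, `r_p`, `phi_p` and `phi_b`.

```python
from itertools import product


def model_autocorr(k, r_u, r_p, phi_p, phi_b):
    """Return (within-dwellings, between-dwellings) autocorrelation at lag k."""
    autocorrelated_share = 1.0 - r_u ** 2
    rho_between = autocorrelated_share * phi_b ** k * (1.0 - r_p ** 2)
    rho_within = autocorrelated_share * phi_p ** k * r_p ** 2 + rho_between
    return rho_within, rho_between


def distance(rho_hat, params):
    """Least squares distance between estimates and the model.

    rho_hat maps (j, k) to the corrected autocorrelation estimate for
    months-in-survey j at lag k.
    """
    total = 0.0
    for (j, k), estimate in rho_hat.items():
        rho_within, rho_between = model_autocorr(k, *params)
        target = rho_within if j > k else rho_between
        total += (estimate - target) ** 2
    return total


def grid_search(rho_hat, steps=10):
    """Coarse grid search (an assumed stand-in for the paper's procedure)."""
    unit_grid = [i / steps for i in range(steps + 1)]      # [0, 1]
    phi_b_grid = [0.94 + 0.01 * i for i in range(5)]       # [0.94, 0.98]
    candidates = product(unit_grid, unit_grid, unit_grid, phi_b_grid)
    return min(candidates, key=lambda p: distance(rho_hat, p))
```

A finer grid, or a bounded optimiser started from the grid minimum, would refine the result; the grid here only illustrates the constraints on the four parameters.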


The resulting smoothed autocorrelation estimates are presented in table 2 along
with estimated standard errors.


Table 2: Estimated standard errors and rotation group autocorrelations


Lag          Proportion unemployed            Proportion employed
(months)     $\sigma_e = 0.11$ (% points)     $\sigma_e = 0.21$ (% points)


Standard error is $\sigma_e$.

Autocorrelations between estimates from the same RG at lag $k$ are $\rho_{Wk}$ within a
set of dwellings and $\rho_{Bk}$ between different sets of dwellings.

n.a. not applicable


4 Composite estimation 


4.1 Simple estimates of a constant TI effect 


Discounting the months-in-survey effect 


Estimates of the TI effect were based on comparing estimates from different 
RGs. Since RG estimates are affected by a months-in-survey effect this needs to 
be removed to avoid affecting the estimates of the TI effect. To this end, the 
formulae in this section will be based on RG estimates $\tilde{y}_t^j = y_t^j - \hat{b}^j$ that subtract
out the estimate of the months-in-survey effect. 


Level-based estimate 


Suppose the TI effect depends only on the month, so that $T_t^j = T_t$. An estimate
of this TI effect based on comparing TI and non-TI groups at a single time point
will be referred to as the level-based estimate. Let $G(t)$ be the set containing the
months-in-survey values for RGs using TI at time $t$, and let $\#G(t)$ be the number
of elements of this set. Then the level-based estimate, written in terms of the
adjusted RG estimates $\tilde{y}_t^j$, is given by


$$ \hat{T}_{Level,t} = \frac{1}{\#G(t)} \sum_{j \in G(t)} \tilde{y}_t^j - \frac{1}{8 - \#G(t)} \sum_{j \notin G(t)} \tilde{y}_t^j . \tag{13} $$


Movement-based estimate


Another estimate can be based on the difference (or movement) between 
successive monthly estimates from a given RG. For RGs having the same sample 
of dwellings at lag 1 there is a high autocorrelation between successive 
estimates, so the movement can have a lower variance than the RG estimates 
themselves. 


To obtain an estimate of TI effect based on the movement, note that in any
month after July 1996 one of the RGs will have been interviewed using FTF last
month and using TI that month. The expected value of the movement in this
RG is thus the true movement $Y_t - Y_{t-1}$ plus the TI effect for that month. So an
estimate of the TI effect can be obtained by subtracting from this RG's movement
an estimate of the true movement.


The true movement is estimated by the movement in the other RGs that have
common dwellings at lag 1. Two versions arise. The first 'monthly' version
estimates the true movement using the non-TI RGs only. The second 'constant'
version uses both TI and non-TI RGs; this gives a lower standard error, but is
only appropriate under the assumption that the TI effect is constant. The RG
changing from FTF to TI is the one with months-in-survey 2, so in terms of the
adjusted RG estimates $\tilde{y}_t^j$ these two
movement-based estimators are given by


$$ \hat{T}_{Monthly,t} = (\tilde{y}_t^2 - \tilde{y}_{t-1}^1) - \frac{1}{7 - \#G(t)} \sum_{j \notin G(t),\, j \ge 3} (\tilde{y}_t^j - \tilde{y}_{t-1}^{j-1}) \quad \text{and} \tag{14} $$

$$ \hat{T}_{Constant,t} = (\tilde{y}_t^2 - \tilde{y}_{t-1}^1) - \frac{1}{6} \sum_{j=3}^{8} (\tilde{y}_t^j - \tilde{y}_{t-1}^{j-1}) . \tag{15} $$
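The level-based and movement-based estimates can be sketched in a few lines. The function names and the data layout (a dictionary from months-in-survey value to adjusted RG estimate) are assumptions for illustration, and the sketch takes the RG with months-in-survey 2 to be the one changing from FTF to TI, as implied by the phase-in described in section 2.2. The usage values are invented.

```python
def level_estimate(y_now, ti_groups):
    """Mean adjusted estimate of TI RGs minus mean of non-TI RGs."""
    ti = [y_now[j] for j in range(1, 9) if j in ti_groups]
    non_ti = [y_now[j] for j in range(1, 9) if j not in ti_groups]
    return sum(ti) / len(ti) - sum(non_ti) / len(non_ti)


def movement(y_now, y_prev, j):
    """Movement for the RG with months-in-survey j (same dwellings, j >= 2)."""
    return y_now[j] - y_prev[j - 1]


def movement_estimate(y_now, y_prev, ti_groups, constant=False):
    """Movement-based estimate; returns None when no comparison RG exists."""
    if constant:
        others = list(range(3, 9))                 # TI and non-TI RGs
    else:
        others = [j for j in range(3, 9) if j not in ti_groups]
    if not others:
        return None
    true_movement = sum(movement(y_now, y_prev, j) for j in others) / len(others)
    return movement(y_now, y_prev, 2) - true_movement


# Invented example: true value moves from 10 to 11, constant TI effect -0.5.
prev = {j: 10.0 + (-0.5 if j in {2} else 0.0) for j in range(1, 9)}
now = {j: 11.0 + (-0.5 if j in {2, 3} else 0.0) for j in range(1, 9)}
print(level_estimate(now, {2, 3}), movement_estimate(now, prev, {2, 3}))
```

When seven RGs use TI there is no non-TI RG with common dwellings at lag 1, so the 'monthly' version is not available for that month.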


Table 3 gives level-based and movement-based estimates of the TI effect for each
month of the phase-in period. Estimates are marked with asterisks to indicate 
whether they are significantly different from zero. 


Table 3: Simple estimates of TI effect based on a single month's data


Level-based and lag 1 movement-based estimates of TI effect


            Proportion unemployed                 Proportion employed
            Level-    Movement-based              Level-    Movement-based
            based     "Monthly"   "Constant"      based     "Monthly"   "Constant"


Aug 1996    0.01  0.09  0.09

Sep 1996    0.53*  -0.71*  0.60*

Oct 1996    0.26  -0.12

Nov 1996    -0.01  0.09

Dec 1996    0.01  0.14

Jan 1997    -0.31  -0.08

Feb 1997    -0.12  n.a.  -0.27  0.89


* significant at p<0.1 level, ** significant at p<0.02 level
n.a. not applicable


The standard errors on these simple estimates are large. The only obvious 
feature is that the TI effect estimates for proportion employed tend to be 
negative. To test this requires combining information across months. 


Combining the estimates 


Suppose the TI effect is constant (i.e. $T_t = T$). Then a suitable weighted average
of the simple estimators of $T$ given above will have a lower variance than the
individual estimates. The appropriate weighted average to use is that which 
minimises the variance. Other simple estimates based on differences between 
estimates at lag two (or more) can also be devised, and these can be included in 
the weighted average. 


The variance of each simple estimator and of any proposed weighted average 
can be evaluated under the model since the variance-covariance structure of the 
RG estimates is known. The calculations that implement this idea are an 
application of the more general formulation discussed below. 


4.2 Designing a composite estimator 


Variance of a linear combination of RG estimates 


Assume that data from $L+1$ months is to be used for the composite estimator.
Write $\tilde{y}$ and $\alpha$ as the column vectors with elements $\{\tilde{y}_{t-l}^j\}$ and $\{\alpha_{t-l}^j\}$ respectively
for $j = 1, \ldots, 8$ and $l = 0, \ldots, L$, where the elements of $\alpha$ lie in the interval [-1,1].
The general formula for the variance of a linear combination of the RG estimates
is given by


$$ \mathrm{var}(\alpha'\tilde{y}) = \alpha' \, \mathrm{var}(\tilde{y}) \, \alpha . \tag{16} $$


Here $\mathrm{var}(\tilde{y})$, the covariance matrix of the RG estimates, is known from the
variance and covariances given in table 2. This matrix calculation can be used to
derive the variance of the simple estimators described in Section 4.1.


A composite estimator of constant TI effect 


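The expected value of a linear combination of the $\tilde{y}$ under the model (1) is


$$ E(\alpha'\tilde{y}) = \sum_{l=0}^{L} \left\{ Y_{t-l} \sum_{j=1}^{8} \alpha_{t-l}^j + T \sum_{j \in G(t-l)} \alpha_{t-l}^j \right\} \tag{17} $$


(for $G$ defined as in 4.1) since $\sum_{j=1}^{8} b^j = 0$ (by definition) and $E(e_t^j) = 0$.


For the linear combination $\alpha'\tilde{y}$ to be an unbiased estimator of $T$ the following
constraints must be placed on the choice of $\alpha$.


$$ \sum_{j=1}^{8} \alpha_{t-l}^j = 0 \quad \text{for } l = 0, \ldots, L, \text{ and} \tag{18} $$

$$ \sum_{l=0}^{L} \sum_{j \in G(t-l)} \alpha_{t-l}^j = 1 . \tag{19} $$


Under these constraints, $E(\alpha'\tilde{y}) = T$ under the model with constant TI effect.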


The optimal composite estimator based on this data will be the choice of $\alpha$ that
minimises the variance (16) under these constraints.


The optimum choice of $\alpha$ was obtained using standard results for minimisation
of a quadratic form under linear constraints (see for example Rao (1973) p. 65).
Writing the constraints in the form $C'\alpha = c$ and setting $V = \mathrm{var}(\tilde{y})$, the optimal
$\alpha$ is given by $\alpha^* = V^{-1} C Q^{-} c$, for $Q^{-}$ any generalised inverse of $(C'V^{-1}C)$.
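The constrained minimisation can be illustrated with a small pure-Python sketch. It is not the authors' implementation: the function names are hypothetical, and for simplicity it assumes that both the covariance matrix and the matrix $C'V^{-1}C$ are nonsingular, so an ordinary linear solve replaces the generalised inverse.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination (a square, b a matrix)."""
    n = len(a)
    m = [list(row_a) + list(row_b) for row_a, row_b in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        m[col], m[pivot] = m[pivot], m[col]
        scale = m[col][col]
        m[col] = [value / scale for value in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def optimal_weights(v, c_mat, c_vec):
    """Weights minimising a'Va subject to C'a = c.

    v is the n x n covariance matrix, c_mat the n x q constraint matrix C
    (one column per constraint) and c_vec the q constraint values.
    """
    n, q = len(v), len(c_vec)
    v_inv_c = solve(v, c_mat)                                   # V^{-1} C
    q_mat = [[sum(c_mat[i][r] * v_inv_c[i][s] for i in range(n))
              for s in range(q)] for r in range(q)]             # C' V^{-1} C
    z = solve(q_mat, [[value] for value in c_vec])
    return [sum(v_inv_c[i][s] * z[s][0] for s in range(q)) for i in range(n)]


# One TI and one non-TI estimate: weights must sum to 0, TI weight must be 1.
print(optimal_weights([[1.0, 0.0], [0.0, 1.0]],
                      [[1.0, 1.0], [1.0, 0.0]], [0.0, 1.0]))
```

In the usage line the two constraints already determine the weights, so the answer is the simple difference between the TI and non-TI estimates; with more RGs and months the covariance matrix decides how the remaining freedom is used.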


Composite estimators of parameters in other TI models


A similar approach was applied to the other models for the TI effect to give 
composite estimators for the various parameters in these models. The process 
was to substitute the appropriate TI effect model into the expectation formula 
(17) and then to minimise variance (16) under constraints that ensure that the 
expectation equals the parameter being estimated. 


For example, for the two level model $M_2$ under the constraints (18) the
expectation of an estimator $\alpha'\tilde{y}$ for $t$ corresponding to February 1997 and $L = 7$ is
given by


$$ E(\alpha'\tilde{y}) = T_{INITIAL} \sum_{l=3}^{7} \sum_{j \in G(t-l)} \alpha_{t-l}^j + T_{FINAL} \sum_{l=0}^{2} \sum_{j \in G(t-l)} \alpha_{t-l}^j . \tag{20} $$


So a minimum variance estimate of $T_{FINAL}$ is obtained by minimising variance
(16) subject to the constraints (18) and the additional constraints
$\sum_{l=3}^{7} \sum_{j \in G(t-l)} \alpha_{t-l}^j = 0$ and $\sum_{l=0}^{2} \sum_{j \in G(t-l)} \alpha_{t-l}^j = 1$. The resulting composite
estimators under the various models of TI effect are presented in table 4.


Table 4: Composite estimators under different models of TI effect,
proportion unemployed, and proportion employed by sex


Model of TI effect               Proportion     Proportion employed
                                 unemployed     Male        Female      Total
M0 Constant to Feb 1997          ..             -0.66**     ..          -0.38**
M1 Monthly: Aug 1996             ..             -1.07**     ..          -0.51
            Sep 1996             ..             -0.71*      ..          -0.46*
            Oct 1996             ..             -1.34**     ..          -0.64**
            Nov 1996             ..             -0.83**     ..          -0.69**
            Dec 1996             ..             -0.23       ..          -0.30
            Jan 1997             ..             -0.47       ..           0.06
            Feb 1997             ..              1.01       ..           0.71
M2 Two level: T_FINAL            ..             -0.19       ..          -0.06
   Two level: T_DIFF             ..             -0.78**     ..          -0.51*
   Two level: T_INITIAL          ..             -0.97**     ..          -0.57*
M3 Transitory effect             ..             -0.97**     ..          -0.60**
* significant at p<0.1 level, ** significant at p<0.02 level
.. value not legible in the source


The effects on proportion employed for males are clear, with a consequent effect 
on the total proportion employed. 


4.3 Nonparametric confidence intervals


Each of the estimators described above is simply a linear combination of the RG 
estimates at one or more successive time points. Applying the estimator to every 
such set of time points before July 1996 gives a number of (autocorrelated) 
values, each of which has the same distribution as the final composite estimator, 
under the model and assuming no TI effect. 


The empirical distribution of these values can therefore be used to estimate the
distribution of the estimator assuming there is no TI effect. This leads to a 
non-parametric confidence interval for the estimate of the TI effect and a 
non-parametric significance test of whether TI effect is 0. 
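The idea can be sketched as follows, assuming the composite weights are held as one list of eight weights per lag and the pre-TI adjusted RG estimates as one list of eight values per month. The function names, the data layout and the simple order-statistic quantile rule are all assumptions made for illustration.

```python
def null_values(history, weights):
    """Apply the composite weights to every window of pre-TI months.

    history[t][j - 1] is the adjusted RG estimate for months-in-survey j at
    month t; weights[lag][j - 1] is the weight on that RG at the given lag.
    """
    span = len(weights)
    values = []
    for t in range(span - 1, len(history)):
        total = 0.0
        for lag in range(span):
            for j, weight in enumerate(weights[lag]):
                total += weight * history[t - lag][j]
        values.append(total)
    return values


def empirical_interval(values, level=0.9):
    """Central interval from order statistics of the null values."""
    ordered = sorted(values)
    tail = (1.0 - level) / 2.0
    last = len(ordered) - 1
    return ordered[round(tail * last)], ordered[round((1.0 - tail) * last)]


def is_significant(estimate, values, level=0.9):
    """True if the estimate falls outside the empirical null interval."""
    low, high = empirical_interval(values, level)
    return estimate < low or estimate > high
```

Because the null values are autocorrelated and few in number, the interval end points are themselves quite variable, which is the caveat discussed in the next paragraph.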


The non-parametric confidence intervals for the statistics have a high variability, 
as not very many observed values are available (one for each time point), and 
these observations are quite highly correlated over time. However, on average 
over the various parameters estimated the confidence intervals appear to be only 
slightly larger than the (parametric) confidence intervals predicted from the 
estimated variance and autocorrelations. The parametric estimates of standard 
error appear to be 1 or 2 per cent too small, and this should be kept in mind 
when considering tests of significance that give borderline results.


5 Fitting a time series model


5.1 State space models 


In a state space model, the observed data at a time point depends on a vector of 
unobserved true states — the relationship is given by the observation equation. 
The unobserved states are modelled as a Markov Chain, with states at one time 
depending on those at the previous time, as given by the state equation. The 
dependencies between the state vectors are used to model the autocorrelation 
between the observed values at successive time points. 


A linear Gaussian state space model has the observed data and the new state
vector as a linear function of the previous state vector plus a linear function of a
vector of i.i.d. N(0, 1) variables known as innovations. For such a state space
model the least squares estimates of the states at each time point are obtained
by using the smoothed Kalman filter. In this study the diffuse Kalman filter of de
Jong (1990) was used.


Advantages of state space modelling 


A state space model is a very flexible model for investigating time series data. It 
is a natural model to use when there is interest in the unobserved components 
of a time series, or when some parameters may change over time. A change of 
model at any time point can be easily handled — for instance, to deal with 
telephone interviewing effects of varying kinds. It is also possible to deal 
automatically with outliers in the model by using innovations that are a mixture 
of Gaussians - see Bell and Carolan (1998) for details. 


5.2 A state space model for sampling error 


This paper uses the state space model for sampling error introduced in Bell and
Carolan (1998). The model decomposes the sampling error $e_t^j$ into three parts:


$$ e_t^j = B_t^j + P_t^j + U_t^j . \tag{21} $$


$B_t^j$ is a component which has a very high autocorrelation $\phi_B$ with the same
component from the same RG in the previous month. This component is
responsible for the long term autocorrelation within a RG.


$P_t^j$ is a component which has high autocorrelation $\phi_P$ with the same component
from the same RG in the previous month, except when the dwellings surveyed
have changed between the two months. This component is responsible for the
autocorrelation due to a common sample of dwellings between months.


$U_t^j$ is a component of the sampling error with no autocorrelation.


The proportion of the variance of the sampling errors that is explained by the
component $U_t^j$ is given by $r_U^2$; the proportion explained by component $P_t^j$ is
given by $(1 - r_U^2) r_P^2$; and the remaining proportion explained by $B_t^j$ is given by
$(1 - r_U^2)(1 - r_P^2)$. Thus the model depends on five parameters, $\sigma_e^2$, $r_U$, $r_P$, $\phi_P$ and
$\phi_B$, the values of which are estimated by methods given in section 3.5. The state
equations for the model are as follows:


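$$ B_t^j = \begin{cases}
\left( 8\sigma_e^2 (1 - r_U^2)(1 - r_P^2) \right)^{1/2} u_{Bt}^j & \text{for } t = 1 \\
\phi_B B_{t-1}^{8} + \left( 8\sigma_e^2 (1 - \phi_B^2)(1 - r_U^2)(1 - r_P^2) \right)^{1/2} u_{Bt}^j & \text{for } j = 1,\ t > 1 \\
\phi_B B_{t-1}^{j-1} + \left( 8\sigma_e^2 (1 - \phi_B^2)(1 - r_U^2)(1 - r_P^2) \right)^{1/2} u_{Bt}^j & \text{for } j = 2, \ldots, 8,\ t > 1
\end{cases} $$

$$ P_t^j = \begin{cases}
\left( 8\sigma_e^2 r_P^2 (1 - r_U^2) \right)^{1/2} u_{Pt}^j & \text{for } j = 1 \text{ or } t = 1 \\
\phi_P P_{t-1}^{j-1} + \left( 8\sigma_e^2 (1 - \phi_P^2) r_P^2 (1 - r_U^2) \right)^{1/2} u_{Pt}^j & \text{for } j = 2, \ldots, 8,\ t > 1
\end{cases} $$

$$ U_t^j = \left( 8\sigma_e^2 r_U^2 \right)^{1/2} u_{Ut}^j \quad \text{for } j = 1, \ldots, 8 \tag{22} $$


The innovations in the model, $u_{Bt}^j$, $u_{Pt}^j$ and $u_{Ut}^j$, are assumed to be i.i.d. N(0,1). The
multipliers applied to the innovations ensure that $\mathrm{var}(e_t^j) = 8\sigma_e^2$.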
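The claim that the multipliers keep the variance of the sampling error constant can be checked numerically. The sketch below is an illustration under a stated assumption: the start-up multipliers (used when a component has no predecessor) carry the full stationary variance of that component, while the continuing multipliers are scaled down by one minus the squared autocorrelation. Function and parameter names are hypothetical, with `sigma2` standing for the squared standard error and the other arguments for the four autocorrelation parameters.

```python
def innovation_multipliers(sigma2, r_u, r_p, phi_p, phi_b):
    """Multipliers on the N(0, 1) innovations for each component."""
    total = 8.0 * sigma2
    var_b = total * (1.0 - r_u ** 2) * (1.0 - r_p ** 2)
    var_p = total * (1.0 - r_u ** 2) * r_p ** 2
    var_u = total * r_u ** 2
    return {
        "B_start": var_b ** 0.5,
        "B_step": (var_b * (1.0 - phi_b ** 2)) ** 0.5,
        "P_start": var_p ** 0.5,
        "P_step": (var_p * (1.0 - phi_p ** 2)) ** 0.5,
        "U": var_u ** 0.5,
    }


def variance_after(months, sigma2, r_u, r_p, phi_p, phi_b):
    """Variance of the sampling error after the given number of months."""
    m = innovation_multipliers(sigma2, r_u, r_p, phi_p, phi_b)
    var_b = m["B_start"] ** 2
    var_p = m["P_start"] ** 2
    for _ in range(months):
        # Each step keeps phi^2 of the old variance and adds the innovation.
        var_b = phi_b ** 2 * var_b + m["B_step"] ** 2
        var_p = phi_p ** 2 * var_p + m["P_step"] ** 2
    return var_b + var_p + m["U"] ** 2


print(variance_after(5, 0.04, 0.5, 0.8, 0.9, 0.96), 8 * 0.04)
```

The two printed values agree up to rounding, which is the sense in which the multipliers hold the variance of each RG's sampling error at eight times the squared standard error.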


A new sample of geographic areas was introduced over the last four months of
1992. This leads to zero autocorrelation for particular combinations of $j$ and $t$.
In the state space modelling approach this is accounted for by using the
equations applicable to $t = 1$ at the time when the new sample was introduced to
a RG.


5.3 The overall state space model 


Modelling the true values 


To complete the state space model requires a model for the dependencies in the 
true series Y;. It is usual to model such dependencies using a decomposition 
into trend, seasonal and irregular, or as an ARIMA model. This was not done in 
this study, except as a subsidiary experiment (see section 6.6). Instead, a simpler 
approach (similar to that of composite estimation) was used in which the true 
values are effectively unrelated from time to time. The state equation associated 
with this is given by 

$$Y_t = Y_{t-1} + \tau_Y u_{Y,t} \qquad (23)$$

for $u_{Y,t} \sim$ i.i.d. $N(0,1)$ and $\tau_Y$ large (e.g. $\tau_Y = 100$ for
data expressed in percentage points). This model does not borrow any strength
from the likely relationship between true values across time, and so may result
in estimates of the TI effect with somewhat higher standard errors than would
have been obtained by assuming a more sophisticated model for the true values.
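
To see why a large $\tau_Y$ makes the true values effectively unrelated over time, consider one Kalman update for a single series observed with error. This is a simplified scalar sketch, not the multi-RG model of this paper; the sampling variance and the alternative value of $\tau_Y$ are made up for illustration.

```python
def local_level_gain(prior_var, tau_y, obs_var):
    """One Kalman step for Y_t = Y_{t-1} + tau_y * u_t, y_t = Y_t + e_t.

    Returns the Kalman gain (weight given to the new observation)
    and the posterior variance of Y_t.
    """
    predicted_var = prior_var + tau_y ** 2
    gain = predicted_var / (predicted_var + obs_var)
    posterior_var = (1 - gain) * predicted_var
    return gain, posterior_var


obs_var = 0.25  # HYPOTHETICAL sampling variance, (percentage points)^2

# tau_Y = 100 as in the text: the previous month carries almost no weight.
gain_free, _ = local_level_gain(prior_var=0.25, tau_y=100.0, obs_var=obs_var)

# A small, HYPOTHETICAL tau_Y: the previous month carries about half the weight.
gain_smooth, _ = local_level_gain(prior_var=0.25, tau_y=0.1, obs_var=obs_var)

print(f"gain with tau_Y=100: {gain_free:.6f}")
print(f"gain with tau_Y=0.1: {gain_smooth:.4f}")
```

With $\tau_Y = 100$ the gain is essentially 1, so the estimate of $Y_t$ is driven by the current month's data alone.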


The state space model 


The overall state space model is given by the equations (1), (21), (22) and (23), 
and can be written out as 


$$y_t^j = Y_t + b^j + T_t^j + B_t^j + P_t^j + U_t^j \qquad (24)$$


The state vector at time $t$ contains $Y_t$ with state equation (23) and $B_t^j$
and $P_t^j$ with state equations (22). The $U_t^j$ appear only in the
observation equation and do not require state equations.

The months-in-survey effects $b^j$ are treated as regression parameters for
$j=1,\ldots,7$ and the restriction $b^8 = -\sum_{j=1}^{7} b^j$ is imposed to
force the months-in-survey effects to add to 0. This is straightforward using
the diffuse Kalman filter. The parameters in the TI effect model could be
treated as regression parameters, but were actually implemented by introducing
an extra state for each parameter. For the constant TI effect model this simply
means including the state $\tau_t$ in the state vector and adding the state
equation $\tau_t = \tau_{t-1}$ to the set of state equations.
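
The sum-to-zero restriction can be imposed through the regression design: the eighth effect is not a free parameter but minus the sum of the other seven. The sketch below is one way to code this; the function names and the numerical effects are hypothetical.

```python
def months_in_survey_row(j, n_groups=8):
    """Regression row for month-in-survey j under the sum-to-zero restriction.

    Returns the coefficients on the free parameters (b^1, ..., b^{n_groups-1}).
    """
    if not 1 <= j <= n_groups:
        raise ValueError("j must lie between 1 and n_groups")
    if j < n_groups:
        return [1.0 if k == j else 0.0 for k in range(1, n_groups)]
    # Last group: b^8 = -(b^1 + ... + b^7)
    return [-1.0] * (n_groups - 1)


def expand_effects(free_effects):
    """Append the restricted effect so that all effects add to zero."""
    return list(free_effects) + [-sum(free_effects)]


# HYPOTHETICAL values for b^1, ..., b^7 in percentage points.
free = [0.10, -0.20, 0.05, 0.00, 0.30, -0.10, 0.02]
all_effects = expand_effects(free)
print(all_effects, sum(all_effects))
```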


Initialising the states 


To complete the definition of the model, ready for estimation using the Kalman 
filter, initial distributions (i.e. before observing any data) for the regression 
parameters and for the states at time t=1 must be specified. These distributions 
are known as priors. The regression parameters were given diffuse priors, 
corresponding to total uncertainty about the initial value. This situation is
handled within the diffuse Kalman filter (using the limit as the variance of the 
prior tends to infinity). The true value and TI effect states could be given diffuse 
priors, but it was computationally cheaper to give the initial values for these a 
normal distribution with large variance. 


The components of the sampling error $B_t^j$ and $P_t^j$ have a zero expectation
and a known variance as given in (22) for time $t=1$. This gives a prior
distribution to
be used for these states. A diffuse prior for these components is not appropriate 
and could give unrealistic results. 
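
As an illustration of these choices, the prior variances can be collected into the diagonal of an initial state covariance matrix. The ordering of the state vector, the value used for the "large" variance and the parameter values below are all assumptions made for this sketch; the stationary variances assume the shares $(1-r_U^2)(1-r_P^2)$ and $(1-r_U^2)r_P^2$ of $8\sigma^2$ for $B$ and $P$.

```python
def initial_state_variances(sigma2, r_u, r_p, n_groups=8, large_var=1e6):
    """Diagonal of the prior covariance for the states at t = 1.

    Assumed state order: [Y, tau, B^1..B^8, P^1..P^8].
    Y and tau get a proper prior with large variance; B and P get
    their stationary variances.
    """
    total = n_groups * sigma2
    var_b = total * (1 - r_u ** 2) * (1 - r_p ** 2)
    var_p = total * (1 - r_u ** 2) * r_p ** 2
    return [large_var, large_var] + [var_b] * n_groups + [var_p] * n_groups


# HYPOTHETICAL parameter values.
prior_diag = initial_state_variances(sigma2=0.04, r_u=0.5, r_p=0.8)
print(len(prior_diag), prior_diag[0], prior_diag[2], prior_diag[10])
```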


5.4 State space model estimates of constant TI effect




Table 5 presents estimates of constant TI effect based on the state space 
modelling approach. 


Table 5: State space model estimators under different models of TI effect,
proportion unemployed, and proportion employed by sex


                                  Proportion    Proportion employed
Model of TI effect                unemployed    Male       Female     Total

M0  Constant to February             0.09       -0.61**    -0.05      -0.35*

M1  Monthly: Aug 1996               -0.01       -1.04**     0.07      -0.49
             Sep 1996                0.54**     -0.66*     -0.24      -0.44
             Oct 1996                0.19       -1.28**     0.02      -0.61**
             Nov 1996                0.03       -0.80**    -0.53*     -0.67**
             Dec 1996                0.09       -0.19      -0.30      -0.27
             Jan 1997               -0.17       -0.44       0.61       0.07
             Feb 1997               -0.12        1.12*      0.48       0.78

M2  Two level: $\tau_{LAT}$         -0.01       -0.14       0.15      -0.04
    Two level: $\tau_{DIFF}$         0.18       -0.78**    -0.33      -0.50*
    Two level: $\tau_{INITIAL}$      0.17       -0.92**    -0.18      -0.54*

M3  Transitory effect                0.19       -0.93**    -0.25      -0.58**


* significant at p<0.1 level, ** significant at p<0.02 level


Note that the estimates are very similar to those for composite estimation. The
sampling errors are also comparable, tending to be a little lower. This is a
useful check on the validity of the analyses. The results of the analyses are
discussed in the next section.


6 The telephone interviewing investigation


6.1 Investigation of impact of TI on estimates


The results in this section follow the investigation of the TI effect over the period 
of the phase-in of telephone interviewing. An investigation was conducted every 
month from August 1996 to February 1997, with methods being refined each 
month as results opened up new questions. The key method used was the state 
space modelling approach, and numbers presented in this section are from that 
approach. Composite estimates and the simple level-based and
movement-based estimates were also examined each month to check the results. 
Plots showing the rotation group estimates by time were also important in 
explaining the cause of particular effects and in looking for patterns. 


The results below are given using all the data up to February 1997. This makes 
for a straightforward presentation, but hides somewhat the full variety of 
investigations conducted over the phase-in period. Each month the 
investigations, including new models prompted by the new data, were 
completed by the day after the data became available. This underlines the easy 
application of the modelling techniques used once the initial investment in 
implementing the methods has been made. 


6.2 Detecting a TI effect 


Early in the phase-in period there was little data available from TI RGs, and the 
focus of investigation was on establishing if the TI method affected any of the 
series. The focus was therefore on the constant TI effect model M0.


By the time November 1996 data was analysed it appeared that the TI RGs were 
likely to report somewhat lower employment than the FTF RGs. In the 
December 1996 issue of the Labour Force Australia publication, the ABS issued 
information about the apparent effect of TI on employment estimates, and the 
range of values quoted was based on the model M0.


6.3 Behaviour of the TI effect over time


As more months of data were accumulated, attention turned to whether the TI 
effect was changing over time. The discussion below will focus on the TI effect 
on proportion employed. 


Graph 1 displays a grey bar for each month representing a 95% confidence 
interval for the TI effect on a RG estimate of proportion employed. These are 
based on model M1, which allows the TI effect to vary from month to month, 
using the state space modelling approach. The true TI effect for a month is 
expected to lie in the grey region with 95% probability. 
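
A 95% interval of this kind is commonly formed as the estimate plus or minus 1.96 standard errors, assuming approximate normality. The sketch below uses that convention, which is an assumption here rather than a statement of how the bars in Graph 1 were computed; the estimates and standard errors are made up.

```python
def confidence_interval(estimate, std_error, z=1.96):
    """Normal-approximation interval: estimate +/- z * standard error."""
    half_width = z * std_error
    return estimate - half_width, estimate + half_width


def includes_zero(interval):
    """True if the interval contains zero (no detectable effect)."""
    lower, upper = interval
    return lower <= 0.0 <= upper


# HYPOTHETICAL estimates and standard errors, in percentage points.
early = confidence_interval(-0.9, 0.3)  # interval lies wholly below zero
late = confidence_interval(-0.2, 0.3)   # interval straddles zero

print(early, includes_zero(early))
print(late, includes_zero(late))
```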


The grey bars tend to be below zero for the period from August 1996 to
November 1996, but from December 1996 they do include zero. This, and
similar graphs for males only, suggests that the TI effect may have declined since
November 1996.


Graph 1: Estimates of effect of telephone interviewing on an RG under TI,
proportion employed, by month


[Graph not reproduced. Vertical axis: estimated effect (percentage points).
Horizontal axis: month, August 1996 to February 1997. Legend: 95% confidence
interval, best monthly estimates; final modelled estimates.]


'Best monthly estimates' are based on state space modelling for model M1;
'Final modelled estimates' are based on state space modelling for model M3.


6.4 Has the TI effect changed from December 1996 onwards?


A model was fitted on data up to February 1997 that proposed a TI effect that 
was constant from August to November 1996 and then constant at a different 
level for December 1996 onwards. Results for this model M2 confirmed that the 
August to November TI effect on employed persons was highly significant. 
Importantly, it also established that the TI effect from December onwards was 
significantly different (at the p<0.02 significance level) from the August to 
November value. The December-onwards level of the TI effect was small and not 
significantly different from zero. 
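
The three "Two level" rows of Table 5 can be checked against each other. The sketch below assumes that the difference parameter is the August to November level minus the December-onwards level, an interpretation that is not stated explicitly here but which the tabulated values satisfy to two decimal places. The variable names are illustrative.

```python
# Two-level model M2 rows of Table 5, in percentage points.
# Columns: unemployed, employed male, employed female, employed total.
tau_lat = [-0.01, -0.14, 0.15, -0.04]      # December 1996 onwards
tau_diff = [0.18, -0.78, -0.33, -0.50]     # assumed: initial minus later level
tau_initial = [0.17, -0.92, -0.18, -0.54]  # August to November 1996

# Residual of tau_initial - (tau_lat + tau_diff) for each column.
residuals = [
    round(init - (lat + diff), 2)
    for lat, diff, init in zip(tau_lat, tau_diff, tau_initial)
]
print(residuals)
```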


6.5 Model based on a transitory effect 


There was some concern that the assumption that the TI effect dropped to zero 
within a single month (i.e. between November and December 1996) was 
unrealistic. The transitory effect model M3 was proposed in which the TI effect 
would be constant from August to November and then decrease gradually to 
zero by February, in such a way that the estimated total impact of TI on estimates 
would decrease linearly. This model fitted the data better than the model M2 
which proposed an abrupt change occurring after November.


It is considered that this model provides the best summary available of the effect
of TI over the phase-in period. The estimates of the TI effect for each month 
from this final model are shown in graph 1. These effects apply to each RG 
under TI in the month given — because the number of RGs under TI increased 
through the period, they translate into an overall effect that increased to 
November and then decreased thereafter. 
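
One reading of the transitory model is sketched below: the per-RG effect is a constant $\tau$ up to November, and afterwards it is scaled so that the total impact (per-RG effect times the share of RGs under TI) falls linearly to zero in February. The phase-in schedule used here is made up for illustration and is not the schedule actually followed in the LFS; the function name is also hypothetical.

```python
MONTHS = ["Aug", "Sep", "Oct", "Nov", "Dec", "Jan", "Feb"]

# HYPOTHETICAL phase-in schedule: number of the 8 RGs under TI in each month.
RGS_UNDER_TI = [1, 2, 3, 4, 5, 6, 8]


def transitory_multipliers(rgs_under_ti, last_constant=3, n_groups=8):
    """Multiplier applied to tau for an RG under TI in each month.

    The multiplier is 1 up to index last_constant (November). After that
    it is chosen so that the total impact declines linearly to zero in
    the final month.
    """
    end = len(rgs_under_ti) - 1
    share_at_peak = rgs_under_ti[last_constant] / n_groups
    multipliers = []
    for t, n_ti in enumerate(rgs_under_ti):
        if t <= last_constant:
            multipliers.append(1.0)
        else:
            total = share_at_peak * (end - t) / (end - last_constant)
            multipliers.append(total / (n_ti / n_groups))
    return multipliers


multipliers = transitory_multipliers(RGS_UNDER_TI)
totals = [m * n / 8 for m, n in zip(multipliers, RGS_UNDER_TI)]
for month, m, total in zip(MONTHS, multipliers, totals):
    print(f"{month}: per-RG multiplier {m:.3f}, total impact {total:.3f} x tau")
```

Under this schedule the total impact rises to November as more RGs move to TI and then falls by equal steps to zero, while the per-RG multiplier falls more quickly because it is spread over a growing number of RGs.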


6.6 Other investigations 


The conclusion from these investigations was that the data is best explained by a 
transitory TI effect that had subsided to near zero by February 1997. In arriving 
at this position, other possibilities were also examined. 


Does the TI effect change with months-in-survey? 


One model investigated was that the effect of telephone interviewing did not
change over time, but instead differed between RGs according to the number of
times the dwellings had been surveyed. On the basis of data up to December 1996
this model fitted fairly well, with some evidence that the effect was greater
for the first few times a RG was surveyed. The model did not fit as well as the
monthly effect model M1.


Using all the data up to February 1997 it was clear that this model was inferior to 
the model of a transitory TI effect. It is possible that there remains some effect 
of time in survey, but it is too small to be estimated reliably. Many more months 
of data will be required to obtain a good picture of what time in survey effects 
apply under the TI methodology. 


Robustness of the results to outliers 


Another issue was whether outliers in the RG data were producing the 
appearance of a TI effect. This was addressed in three simple ways. The first was
to exclude the most influential RG and month estimate, by giving it a separate 
parameter. The second way was to assume that a specific RG had an additional 
impact on the estimates for a series of months. In both these experiments the 
estimated TI effect decreased, but was still quite significant. 


The third way was to examine whether any of the innovations over the period of
TI introduction were large (defined as over 2.5 for this model where the
innovations are distributed $N(0,1)$). Such values would be highly influential,
and would be better treated as outliers or using non-Gaussian state space
modelling as mentioned in Section 5.1. This examination did not identify any
outliers during the TI phase-in period.
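
A check of this kind is simple to automate. The sketch below flags standardised innovations whose absolute value exceeds 2.5; treating "over 2.5" as an absolute value is an assumption, and the innovation values are invented for illustration.

```python
def flag_large_innovations(std_innovations, threshold=2.5):
    """Return (index, value) pairs whose absolute value exceeds the threshold."""
    return [
        (t, value)
        for t, value in enumerate(std_innovations)
        if abs(value) > threshold
    ]


# HYPOTHETICAL standardised innovations, nominally N(0, 1).
innovations = [0.4, -1.1, 2.7, 0.3, -2.6, 1.9]
flagged = flag_large_innovations(innovations)
print(flagged)
```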


Decomposing the true value into trend, seasonal and irregular 


The study used a simple model for the true value that gave no penalty to large 
movements between successive months. It was mentioned previously that an 
alternative approach would have been to impose a model for the true value that 
accounts for its trend and seasonality. 


To look at the effects of this, the model for the true value $Y_t$ was replaced
by a state space decomposition into trend, seasonal and irregular as given in
Bell and Carolan (1998). The model used is known as the Basic Structural Model
(BSM) with local linear trend, to distinguish it from the 'free' model described
by equation (23). In the BSM model, the state equation (23) for the true value is
replaced by the decomposition


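$$Y_t = L_t + S_t + I_t \qquad (25)$$

and state equations for the three components are introduced:

$$
\begin{aligned}
L_t &= 2L_{t-1} - L_{t-2} + \tau_L u_{L,t} && \text{Trend component,} \\
S_t &= -\sum_{r=1}^{11} S_{t-r} + \tau_S u_{S,t} && \text{Seasonal component,} \\
I_t &= \tau_I u_{I,t} && \text{Irregular component.}
\end{aligned}
\qquad (26)
$$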


for innovations $u_{L,t}$, $u_{S,t}$ and $u_{I,t}$ i.i.d. $N(0,1)$. This approach
was applied to the analysis of proportion employed for each sex and in total.
The variance parameters in the decomposition (defining for example how smooth
the trend is) were set to values ($\tau_L = 0.018$, $\tau_S = 0.001$ and
$\tau_I = 0.18$) suitable for a decomposition of the Australian series.
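
The decomposition can be simulated directly from its state equations. The sketch below assumes a monthly series, so the seasonal equation sums the previous 11 seasonal terms, and uses the three parameter values quoted above; the starting states are made up, and the function name is illustrative.

```python
import random

N_LAGS = 11  # lagged seasonal terms carried in the state for monthly data


def bsm_step(level, level_prev, seasonals, tau_l, tau_s, tau_i, rng):
    """Advance the trend and seasonal states one month and return Y_t.

    seasonals holds S_{t-1}, ..., S_{t-11}, most recent first.
    Returns (L_t, L_{t-1}, updated seasonals, Y_t).
    """
    new_level = 2 * level - level_prev + tau_l * rng.gauss(0, 1)
    new_seasonal = -sum(seasonals) + tau_s * rng.gauss(0, 1)
    irregular = tau_i * rng.gauss(0, 1)
    y = new_level + new_seasonal + irregular
    return new_level, level, [new_seasonal] + seasonals[:-1], y


rng = random.Random(1998)

# HYPOTHETICAL starting states, in percentage points.
level, level_prev = 58.0, 57.9
seasonals = [0.2, -0.1, 0.0, 0.3, -0.2, -0.1, 0.1, 0.0, -0.3, 0.2, -0.1]
assert len(seasonals) == N_LAGS

series = []
for _ in range(24):
    level, level_prev, seasonals, y = bsm_step(
        level, level_prev, seasonals, 0.018, 0.001, 0.18, rng
    )
    series.append(y)

print([round(value, 2) for value in series[:6]])
```

Setting the three scale parameters to zero gives a deterministic linear trend plus a seasonal pattern whose twelve consecutive values sum to zero, which is a quick way to check the recursion.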


Results for the transitory effect model M3 based on the free and BSM models of 
the true value are given in table 6. This experiment resulted in the BSM model 
giving estimates of TI effect with standard errors about 10 per cent lower than 
the free model. The cost of these improved SEs is that one has to make more 
assumptions about the structure of the true value. 


Table 6: Estimate and standard error of transitory TI effect $\tau_{TRANS}$,
proportion employed, by model for true value, by sex


           Free model for $Y_t$            BSM model for $Y_t$
           Estimate   Standard error      Estimate   Standard error


males 


females 


total 


Using the BSM model would have given slightly different estimates of the TI 
effect, but the conclusions of the investigation would have been unchanged, 
with both models showing significant TI effects. 


7 Conclusion 


The main conclusion of these investigations is that in repeated surveys it can be 
worthwhile to use models that explicitly account for autocorrelations in the 
survey estimates. This paper has described methods to do this in the context of 
the investigation of telephone interviewing in the Australian Labour Force 
Survey. 


These methods can be applied to the analysis of repeating surveys where overlap 
is controlled and where separate estimates can be generated for groups of 
sampled units that have different rotation histories. In the case presented here 
this was achieved by classifying estimates by LFS rotation group. 


The investigation of telephone interviewing identified a transitory effect on 
estimates by Labour Force status during the period of the phase-in. This effect
appears to have diminished to near zero by the end of the phase-in period.